DETAILED ACTION

Response to Arguments 

1. Applicant's after-final response, filed September 7, 2005, has been
entered and made of record. An interview summary is also attached to this action
detailing the phone discussion between Applicant's representative Kenneth Nigon, Reg.
No. 31,549, and Examiners Tucker and Bali, of the after-final response with regard to the
final rejection previously presented. In light of that interview the after-final request has
been considered.

2. Applicant has withdrawn claims 16-27 in response to the previous 
restriction requirement. Claims 1, 3-6 and 12 were previously amended. Claims 1-15 
remain pending. 

3. The arguments presented by Applicant in regard to amended independent 
claim 1 presented in the above-mentioned interview are found persuasive for at least 
the following reasons. Accordingly a new rejection has been presented. The previously 
presented final rejection has been withdrawn and this action is accordingly non-final. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for 
all obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1-2, 5, 8-12, 15 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over the combination of [ChenWilliams93] (S.E. Chen and L. Williams, 
View Interpolation for Image Synthesis, ACM-SIGGRAPH, 1993) and U.S. Patent 
6,469,710 to Shum et al.

With regard to claim 1, Chen discloses in a system using a plurality of fixed
imagers covering a scene, a method to create a high quality virtual image, in real time,
as seen from a virtual viewpoint of the scene (p. 281, left column, paragraph 1). 

Chen further discloses step a) selecting at least two images corresponding to at 
least two of the plurality of fixed imagers to be used in creating the high quality virtual 
image (Section 2 View Morphing paragraph 1, sentence 3). These images are
considered captured by a set of corresponding cameras. 

Chen further discloses step b) creating at least two depth maps corresponding to 
the at least two images (Chen, page 280, left column, paragraph 1, lines 4-7 and 
paragraph 2, lines 4-6). Here Chen discloses that range data is generated using 
ranging devices. This is interpreted as depth mapping. 

Chen further discloses step c) determining at least two sets of warp parameters 
using the at least two depth maps corresponding to said at least two images, each set 
of warp parameters corresponding to warping one of the at least two images to the 
virtual viewpoint (page 280, left column, paragraph 1, lines 4-7 and page 281, left
column, paragraph 2, lines 4-18). Chen discloses determining morph maps wherein the
two images are morphed to one another and the morphed images are interpolated or in 
a way merged to form an image corresponding to a virtual viewpoint in between the 
viewpoints used to capture the images. 

Chen does not disclose the feature of step d) wherein the two generated warped 
images are each warped to an image representing the virtual viewpoint. The difference 
in Chen and the presently claimed invention is that Chen uses two warped images 
warped from one view point to the other and then the two warped images are used to 
interpolate an image representing a virtual viewpoint (Section 2, View Morphing, 
paragraph 2, lines 5-7), while the presently claimed invention generates two warped
images, each representing a virtual viewpoint, and then merges the two warped images. The
reference of U.S. Patent 6,469,710 to Shum et al. is cited to teach this feature. Shum
teaches using a 3D model and multiple images from multiple viewpoints
(column 6, lines 50-55) to create an image by combining several warped images
(column 3, lines 1-30 and Fig. 1).

Chen discloses the step e) of merging the at least two warped images to create 
the high quality virtual image (column 2, lines 50-67). Chen teaches that the images are
warped to a view point and then the warped images are weighted and combined to 
produce a final virtual viewpoint image in a blending process making it possible to fill in 
spaces that were occluded in one warped image and not in another (column 2, lines 50- 
67). Therefore it would have been obvious to one of ordinary skill in the art at the time 
of invention to use the warping and mapping process as taught by Shum in order to
create a better image corresponding to a virtual viewpoint in combination with the image
warping taught by Chen. 
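
For illustration only, the merging of step e) discussed above might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from either reference: the two inputs are assumed to be grayscale images already warped to the same virtual viewpoint, a value of None is assumed to mark a pixel left empty by occlusion, and the function name merge_warped and the single blending weight weight_a are hypothetical.

```python
from typing import List, Optional

# Assumption: None marks a hole, i.e. a pixel that received no data after warping.
Pixel = Optional[float]
Image = List[List[Pixel]]


def merge_warped(img_a: Image, img_b: Image, weight_a: float) -> Image:
    """Blend two images that were already warped to the same virtual viewpoint.

    Where both images have data, the result is a weighted average.
    Where only one image has data, that value fills the hole.
    Where neither has data, the hole (None) is kept.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    merged: Image = []
    for row_a, row_b in zip(img_a, img_b):
        out_row: List[Pixel] = []
        for a, b in zip(row_a, row_b):
            if a is not None and b is not None:
                out_row.append(weight_a * a + (1.0 - weight_a) * b)
            elif a is not None:
                out_row.append(a)
            else:
                out_row.append(b)  # may still be None if both are holes
        merged.append(out_row)
    return merged
```

In this sketch a pixel occluded in one warped image is filled from the other, which corresponds to the hole-filling effect of the blending process described above.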

With regard to claim 12, the discussion of claim 1 applies. 

With regard to claim 2, Chen discloses the step of selecting the virtual viewpoint
by an operator (p.279, right column, lines 1-2). 

The following is in regard to Claims 5 and 15. The range data (depth map)
associated with each of the input images can be obtained according to a variety of
different techniques. One method suggested by [ChenWilliams93] is to obtain the range
data using ranging sensors ([ChenWilliams93] page 280, left column, paragraph 2, lines
4-6). Though not explicitly disclosed in [ChenWilliams93], the following aspects of
such a configuration are clearly inherent:

(5.a.) Mounting the depth (ranging) sensors viewing the scene coincident 
with the fixed imagers. It is typically assumed that each pixel of an 
image is associated with a visible point in the three-dimensional 
space of a given scene. Each pixel is thus associated with a 
particular depth - the depth of the scene point. Generally, this
depth is measured relative to the center-of-projection (COP) of the
corresponding imager or viewpoint 1. Therefore, if the aim is to
generate range data from the COP of an imager to the viewed 
scene using depth sensors, then it is necessary that the depth 
sensors be mounted in close proximity (coincident, if feasible) to 
the location of the imager. 
(5.b.) Selecting at least two depth sensors corresponding to the images. 
(5.c.) Measuring a plurality of depth values (this is what depth sensors do)
with the depth sensors. As stated above, the depth values are 
required for each pixel (i.e. "the plurality of image coordinates") of 
the given images to determine the aforesaid pixel-to-pixel 
correspondences. See steps (1.b.)-(1.c.) above. 
(5.d.) As stated above, a depth map (range data) is obtained for each of 
the input images. See steps (1.b.)-(1.c.) above. Clearly, in a 
configuration that utilizes depth sensors, these depth maps would 
consist of the measured depth values. 
It has thus been shown that an implementation of [ChenWilliams93], which utilizes 
ranging sensors to derive the range data of the given images, inherently comprises all 
substantive elements as set forth in Claim 5. 

With regard to claim 15, the discussion of claim 5 applies. 



1 This, of course, assumes a pinhole camera model. This assumption is made by both the Applicant and [ChenWilliams93]. A
pinhole camera is, for all intents and purposes, located at its center-of-projection.

The following is in regard to Claims 8-11. [ChenWilliams93] can synthesize novel
views from images acquired at multiple viewpoints. [ChenWilliams93], therefore, 
supports multiple cameras (imagers). The authors pose no limit on the number of input 
images, other than there be at least two. Indeed, [ChenWilliams93] describes view 
interpolation primarily within the context of a two camera/two input image system (e.g. 
[ChenWilliams93], page 281, left column, paragraph 1) - that is, a system where exactly 
two images are selected. Also, a three camera system (a system where exactly three 
images are selected) is illustrated in Fig. 7 of [ChenWilliams93]. 

Assuming a three camera system (a system where exactly three images are
selected), Fig. 7 of [ChenWilliams93] clearly shows exactly three images that
correspond to three fixed imagers (e.g. View1, View2, and View3) arranged in a
triangular fashion. This configuration is, of course, a geometric pattern of fixed imagers. 



6. Claim 3 is rejected under 35 U.S.C. § 103(a) as being unpatentable over 
the combination of [ChenWilliams93] and U.S. Patent 6,469,710 to Shum et al. and
further in view of [Faugeras95] (O. Faugeras et al., 3-D Reconstruction of Urban Scenes
from Sequences of Images, INRIA, 1995).

The following is in regard to Claim 3. As shown above, [ChenWilliams93] and
Shum satisfy all limitations of Claim 1 and therefore meet the limitations including the
steps b-f in Claim 3 which repeat respective steps a-e in Claim 1. However,
[ChenWilliams93] and Shum do not disclose selecting the virtual viewpoint based on 
tracking at least one feature in the scene. 

[Faugeras95] discloses a method to reconstruct a three-dimensional model of a 
static environment viewed by one or several cameras whose motions or relative 
positions are unknown and whose intrinsic parameters are also unknown and may vary 
([Faugeras95], Introduction, paragraph 1). The problem solved by [Faugeras95], though 
more in the realm of image-based modeling, is nonetheless similar to that of 
[ChenWilliams93]. [Faugeras95] suggests tracking a set of feature points through a 
given sequence of images. If a given feature point can be tracked all the way between 
two of the given views, a correspondence is established between those views. In this 
manner, a subset of the given set of images is used to establish feature
correspondences between images. See [Faugeras95] Section 2 Robust Recovery of the 
Geometry, paragraph 1, sentence 1 and Section 2.1, paragraph 3, sentences 1-2. 

Given the teachings of [Faugeras95], it would have been obvious to one of 
ordinary skill in the art, at the time of the Applicant's claimed invention, to select a 
subset of the given images in [ChenWilliams93] based on whether those images contain 
a set of tracked feature points. The advantages of such a modification are (at least) 
twofold. First, the resultant methodology would be capable of synthesizing novel views 
of designated feature(s) in the observed scene. Secondly, correspondences (and,
presumably, all subsequent steps) are derived only for the reserved frames. As a result,
the computational burden is reduced.

Application/Control Number: 09/978,158
Art Unit: 2623

7. Claim 4 is rejected under 35 U.S.C. § 103(a) as being unpatentable over 
the combination of [ChenWilliams93] and U.S. Patent 6,469,710 to Shum et al. and 
further in view of [Trucco98] (E. Trucco and A. Verri, Introductory Techniques for 
Computer Vision © 1998, Prentice-Hall, Chapters 7-8). 

The following is in regard to Claim 4. As shown above, [ChenWilliams93] and 
Shum satisfy all limitations of Claim 1. [ChenWilliams93] further suggests determining
the depth maps associated with each of the given images using photogrammetric 
techniques ([ChenWilliams93] page 280, left column, paragraph 2, lines 4-6). Although 
these techniques are well-known, [ChenWilliams93] does not propose using any 
particular photogrammetric technique. 

Generally speaking, photogrammetry is the study in which the three-dimensional 
coordinates of points on an object are determined by measurements made in two or 
more photographic images taken from different positions. The problem of stereo vision 
belongs to the field of photogrammetry. The essence of stereo vision lies in solving the 
stereo correspondence problem ([Trucco98] Section 7.1.1, paragraph 1). 

As shown in [Trucco98] ([Trucco98] Section 7.1.1, paragraph 2, lines 1-6), the 
disparity map represents a solution of the stereo correspondence problem, assuming
the geometry of the stereo system is known 2. As stated previously, disparity is inversely
proportional to depth. See also [Trucco98], page 144. The disparity map and depth map
are, therefore, trivially related. Given the suggestion of [ChenWilliams93] to use 
photogrammetry to derive the depth maps, the teachings of [Trucco98] with regard to 
such a method, and the fact that [ChenWilliams93] presupposes a priori knowledge of 
the intrinsic and extrinsic camera parameters 3 ([Trucco98] page 144: Parameters of a 
Stereo System), it would have been obvious to one of ordinary skill in the art, at the time 
of the Applicant's claimed invention, to derive the depth (disparity) maps via stereo 
correspondence. 
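
The inverse relation between disparity and depth noted above can be illustrated with a short sketch. The following assumes a rectified, parallel-axis stereo pair, for which the standard relation Z = f * B / d holds (f the focal length in pixels, B the baseline, d the disparity in pixels). The function name, the parameter names, and the use of None for zero disparity are assumptions made here for illustration and do not come from the cited references.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity_map: List[List[float]],
    focal_length_px: float,
    baseline: float,
) -> List[List[Optional[float]]]:
    """Convert a disparity map to a depth map using Z = f * B / d.

    Assumes a rectified, parallel-axis stereo pair. Depth is returned in
    the same unit as the baseline. A disparity of zero corresponds to a
    point at infinity and is returned as None.
    """
    if focal_length_px <= 0 or baseline <= 0:
        raise ValueError("focal length and baseline must be positive")
    depth_map: List[List[Optional[float]]] = []
    for row in disparity_map:
        depth_row: List[Optional[float]] = []
        for d in row:
            if d < 0:
                raise ValueError("disparity must be non-negative")
            depth_row.append(None if d == 0 else focal_length_px * baseline / d)
        depth_map.append(depth_row)
    return depth_map
```

Under these assumptions, doubling the disparity of a pixel halves its computed depth, which is the sense in which the two maps are trivially related.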

Under certain constraints, it can be shown that the optical flow 3 between a set of
images and the disparity (hence, depth) are approximations of one another. To illustrate 
this, the notion of a motion field is introduced. The motion field is the two-dimensional 
vector field of velocities of the image points, induced by the relative motion between the 
viewing camera and the observed scene ([Trucco98], page 183). This relative motion 
may manifest itself as the viewing camera moving about a static scene. For static 
scenes, movement of the camera about the scene is equivalent to capturing the scene 
from a plurality of fixed cameras located at discrete locations along the path of the 
camera. The derivation of the motion field induced by a camera moving relative to a 
static scene is thus conceptually similar to the stereo correspondence problem for pairs 
of cameras fixed along the path of the moving camera. Indeed, the motion field 

2 This is a key assumption in [ChenWilliams93]. See [ChenWilliams93], Section 2.1, paragraph 1, sentence 3.

3 The optical flow is defined as the apparent motion of the image brightness pattern.

coincides with the stereo disparity map when spatial and temporal differences between
frames are sufficiently small ([Trucco98], page 185: Stereo Disparity Map and Motion 
Field). Returning to the discussion of optical flow, [Trucco98] points out that, if one 
assumes a globally illuminated scene of Lambertian (diffusive) surfaces, then optical 
flow is an approximation of the motion field ([Trucco98], page 195: Optical Flow and 
Motion Field). Taking into account the previous observations, the following can be 
concluded. If a set of input images depicts a globally illuminated scene of Lambertian 
(diffusive) surfaces, from a corresponding set of tightly spaced and spatially coherent 
viewpoints, then the disparity map and optical flow field are approximations of one 
another 4,5. These observations imply that, under the first and second constraints, the
derivation of the disparity (depth) maps, using photogrammetric methods, involves: 

(4.a.) Calculating a plurality of optical flow values (disparity) between the 
set of input images. 

The disparity of an image pixel is actually the parallax 6 caused by viewing the 
corresponding scene point from different viewpoints. Disparity in image pairs is often 
referred to as binocular parallax. Thus, in calculating the disparity of an image pixel, one 
has also calculated a parallax value associated with the pixel. Given this observation, 
the derivation of the disparity (depth) maps further includes: 

4 Note that the images are given and can be presumed to have been captured simultaneously. In this case, the temporal difference 
between images is negligible. 

5 For the sake of brevity, the constraint of small spatial and temporal differences between frames will be referred to as the "first
constraint"; and the "second constraint" will refer to the assumption of global illumination and Lambertian (diffusive)
reflectivity for all scene surfaces.

(4.b.) Calculating a plurality of parallax values (disparity) corresponding
to pixels (i.e. a plurality of image coordinates) in the given input
images from optical flow values (disparity).
[ChenWilliams93] satisfies both the first constraint ([ChenWilliams93] page 280, 
left column, paragraph 1, sentence 1) and the second constraint ([ChenWilliams93] 
page 280, right column, paragraph 1, lines 8-12). Therefore, the derivation of the depth 
maps implies steps (4.a.)-(4.b.) above and, thus, the step of: 

(4.c.) Calculating the depth (disparity) maps using the image pixels and
the parallax (disparity) values.
That is, steps (4.a.)-(4.c.) are implicit to the calculation of the depth maps by stereo 
reconstruction in the method of [ChenWilliams93]. 

8. Claim 7 is rejected under 35 U.S.C. § 103(a) as being unpatentable over 
the combination of [ChenWilliams93] and U.S. Patent 6,469,710 to Shum et al. and 
further in view of [Rogina01] (U.S. Patent Application Publication 2001/0043737,
assigned to Rogina et al.).

The following is in regard to Claim 7. As shown above, [ChenWilliams93] 
satisfies all limitations of Claim 1. However, [ChenWilliams93] does not disclose
selecting the input images based on a proximity of the virtual viewpoint to the viewpoints
corresponding to the input images.

6 Parallax is the apparent displacement or the difference in apparent location of an object as seen from two different viewpoints
not on a straight line with the object.

[Rogina01] discloses a method of providing an image from an arbitrary virtual
viewpoint. In that method, a plurality of discrete two-dimensional images are acquired,
each corresponding to the image of a scene observed from a plurality of discrete
viewpoints on a predetermined viewpoint locus ([Rogina01] column 2, paragraph [0011],
sentences 1-2; see also Fig. 1). In a process analogous to [ChenWilliams93],
[Rogina01] uses an input viewpoint (base viewpoint) to map from transform images into
the virtual viewpoint image ([Rogina01] column 2, paragraph [0011], last sentence). The
base viewpoint is selected from the discrete viewpoint locus. According to [Rogina01], it
is desirable to select a base viewpoint close to the virtual viewpoint. See [Rogina01]
column 2, paragraph [0011], sentences 5-6. Note that [Rogina01] also uses adjacent
viewpoints in the view interpolation ([Rogina01], Abstract, lines 10-14). In that sense, the
selection of the base viewpoint entails a selection of additional adjacent viewpoints
(which should also be close to the virtual viewpoint) - that is, at least two proximate
images are selected.
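
The proximity-based selection described above might be sketched as follows. This is only an illustrative sketch: it assumes that each fixed imager's viewpoint is known as a 3-D position and that closeness is measured by Euclidean distance, neither of which is stated in the cited reference. The function name select_nearest_views and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def select_nearest_views(
    virtual_viewpoint: Point3,
    camera_viewpoints: Sequence[Point3],
    count: int = 2,
) -> List[int]:
    """Return the indices of the `count` cameras closest to the virtual viewpoint.

    Assumption: closeness is plain Euclidean distance between positions.
    Indices are returned nearest first.
    """
    if count < 1 or count > len(camera_viewpoints):
        raise ValueError("count must be between 1 and the number of cameras")
    order = sorted(
        range(len(camera_viewpoints)),
        key=lambda i: math.dist(virtual_viewpoint, camera_viewpoints[i]),
    )
    return order[:count]
```

With the default count of two, the first index returned plays the role of the base viewpoint and the second that of an adjacent proximate viewpoint.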

It would have been obvious to one of ordinary skill in the art, at the time of the
Applicant's claimed invention, to incorporate this simple selection process into
[ChenWilliams93]. According to [Rogina01], selecting the viewpoints closest to the
virtual viewpoint alleviates skewing and accurately reflects occlusions of distant objects
by close objects ([Rogina01] column 13, paragraph [0102]).

9. Claims 6 and 14 are rejected under 35 U.S.C. § 103(a) as being
unpatentable over the combination of [ChenWilliams93] and U.S. Patent 6,469,710 to
Shum et al., in view of [LuoMaitre90] (W. Luo and H. Maitre, Using Surface Model to
Correct and Fit Disparity Data in Stereo Vision, IEEE, 1990).

The following is in regard to Claim 6. As shown above, [ChenWilliams93] 
satisfies all limitations of Claim 1. However, [ChenWilliams93] does not create the 
aforementioned depth (disparity) maps by: 

(6.a.) Separating the given set of images into a plurality of segments,
wherein pixels of each segment have substantially homogenous
values.
(6.b.) Calculating a depth value corresponding to each segment.
(6.c.) Optimizing the depth values corresponding to each segment.
(6.d.) Creating the aforementioned depth maps from the plurality of
optimized depth values.

[LuoMaitre90] disclose a method for stereo reconstruction 7 ([LuoMaitre90]
Abstract) comprising the steps of:

(6.a.) The images are segmented into regions of substantially uniform
values (gray values). See [LuoMaitre90] Section 3.1, item (b) and
Abstract, sentence 3.
(6.b.) The depth value (disparity) of each segment is calculated. See
[LuoMaitre90] Section 3.1, item (a) and second to last paragraph,
sentence 2.
(6.c.) The disparities of each segment (referred to, henceforth, as the
disparity map of a segment) are optimized by the following:

1. Fitting a plane to the disparity map of each segment. See
[LuoMaitre90] Section 3.1, second to last paragraph,
sentence 4.

2. The goodness-of-fit is determined. See [LuoMaitre90],
Section 3.1, last paragraph.

3. Errors are corrected ([LuoMaitre90], Section 3.1, last
paragraph, last sentence and Section 3.2).

4. If the fit is still unacceptable the segment is subdivided. See
Section 3.3 of [LuoMaitre90].

(6.d.) If the fitted planar model is acceptable for a given segment, it is fit
to the measured disparity map. The fitted plane then becomes the
"optimized" disparity map for the given segment. See [LuoMaitre90]
Section 3.4. This is clearly done for all segments in each of the
input images so as to obtain a complete disparity (depth) map for
each of the images.

7 Recall from above that stereo reconstruction yields a disparity or depth map associated with a given image.

The primary advantage of [LuoMaitre90] is that the fitted plane can provide a dense
set of disparity values (depths) from a sparse set of measured disparities. Furthermore,
as a mathematical model, the fitted plane has sub-pixel resolution. Taking this into
account, it would have been obvious to one of ordinary skill in the art, at the time of the
Applicant's claimed invention, to derive the depth (disparity) maps of [ChenWilliams93]
according to the teachings of [LuoMaitre90].
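
The plane-fitting idea discussed above, in which a sparse set of measured disparities in a segment yields a dense, sub-pixel disparity model, might be sketched as follows. This is a hedged illustration only: an ordinary least-squares fit of d = a*x + b*y + c is assumed here, and the source does not state which fitting criterion [LuoMaitre90] actually uses. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]   # (x, y, measured disparity)
Plane = Tuple[float, float, float]    # (a, b, c) in d = a*x + b*y + c


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_disparity_plane(samples: Sequence[Sample]) -> Plane:
    """Least-squares plane through sparse disparity samples of one segment."""
    if len(samples) < 3:
        raise ValueError("at least three samples are required")
    sxx = sxy = sx = syy = sy = sxd = syd = sd = 0.0
    for x, y, d in samples:
        sxx += x * x; sxy += x * y; sx += x
        syy += y * y; sy += y
        sxd += x * d; syd += y * d; sd += d
    # Normal equations M * [a, b, c]^T = r, solved with Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(samples))]]
    r = [sxd, syd, sd]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("samples are collinear; the plane is not unique")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        coeffs.append(det3(mc) / det)
    return (coeffs[0], coeffs[1], coeffs[2])


def plane_disparity(plane: Plane, x: float, y: float) -> float:
    """Evaluate the fitted plane at any (possibly sub-pixel) location."""
    a, b, c = plane
    return a * x + b * y + c


def rms_residual(plane: Plane, samples: Sequence[Sample]) -> float:
    """Root-mean-square error of the fit, one possible goodness-of-fit measure."""
    total = sum((plane_disparity(plane, x, y) - d) ** 2 for x, y, d in samples)
    return math.sqrt(total / len(samples))
```

Once fitted, plane_disparity can be evaluated at every pixel of the segment, including non-integer coordinates, while rms_residual gives one simple way to decide whether a fit is acceptable or the segment should be subdivided.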

The following is in regard to Claim 14. As shown above, [ChenWilliams93] 
satisfies all limitations of Claim 12. As just discussed with respect to Claim 6, 
[LuoMaitre90] is a segmentation-based method for disparity (depth) calculation. Note
that the brightness value is never used in [LuoMaitre90], aside from its use in evaluating
the homogeneity of image regions. Therefore, it would have been obvious to one of 
ordinary skill in the art, at the time of the Applicant's claimed invention, to combine 
[LuoMaitre90] and [ChenWilliams93], in the manner suggested above, and further 
extend [LuoMaitre90] to accommodate color images. 

10. Claim 13 is rejected under 35 U.S.C. § 103(a) as being unpatentable over 
the combination of [ChenWilliams93] and U.S. Patent 6,469,710 to Shum et al., in view 
of [Saito99] (H. Saito et al., Appearance-Based Virtual View Generation of Temporally- 
Varying Events from Multi-Camera Images in the 3D Room, IEEE, 1999).

The following is in regard to Claim 13. As shown above, [ChenWilliams93]
satisfies all limitations of Claim 12. [ChenWilliams93], however, does not disclose using 
a view-based volumetric mapping means to create depth maps of the images. 

[Saito99] proposes an "appearance [view]-based" virtual view generation method 
([Saito99] Abstract). Depth images are derived for each camera using a multi-baseline 
stereo methodology ([Saito99] Section 4.1, paragraph 1). These depth images are 
merged to form a three-dimensional volumetric model ([Saito99] Section 4.1, paragraph 
2). Using the volumetric model to resolve occlusions, [Saito99] derive a disparity (depth) 
map for each of the input views ([Saito99] Section 4.2 and Fig. 7). Clearly, in this sense, 
[Saito99] represents a view-based volumetric mapping means for creating depth 
(disparity) maps. 

This volumetric process is superior because it successfully resolves occluded 
regions in all of the given views ([Saito99], Abstract, sentences 4-5). Therefore, it would 
have been obvious to one of ordinary skill in the art, at the time of the Applicant's
claimed invention, to use the method of [Saito99] to create depth images for each of the 
input images of [ChenWilliams93]. 

11. Any inquiry concerning this communication or earlier communications from
the examiner should be directed to Wes Tucker whose telephone number is 571-272-7427.
The examiner can normally be reached on 9AM-5PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Jingge Wu, can be reached on 571-272-7429. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 



Wes Tucker